Beam selection for millimeter-wave links in a vehicular scenario is a challenging problem, since an exhaustive search over all candidate beam pairs cannot be completed with certainty within short contact times. We address this problem by leveraging multimodal data collected from sensors such as LiDAR, camera images, and GPS. We propose individual-modality and distributed fusion-based deep learning (F-DL) architectures that can execute locally as well as at a mobile edge computing (MEC) center, and study the associated trade-offs. We also formulate and solve an optimization problem that accounts for practical beam-searching, MEC processing, and sensor-to-MEC data delivery latency overheads in order to determine the output dimensions of the above F-DL architectures. Results of extensive evaluations on publicly available synthetic and home-grown real-world datasets reveal 95% and 96% improvements in beam selection speed, respectively, over classical RF beam sweeping. F-DL also outperforms state-of-the-art techniques by 20-22% in predicting the top-10 best beam pairs.
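To make the fusion idea above concrete, here is a minimal sketch of a multimodal network that encodes LiDAR, camera, and GPS inputs separately, concatenates the embeddings, and scores candidate beam pairs so that only the predicted top-k need to be swept. All layer sizes, input shapes, and the 256-beam-pair output are illustrative assumptions, not the paper's F-DL architecture.

```python
# Minimal multimodal fusion sketch (illustrative only): each sensor modality is
# encoded separately, the embeddings are concatenated, and a shared head scores
# the candidate beam pairs. Layer sizes and input shapes are assumptions.
import torch
import torch.nn as nn

class FusionBeamSelector(nn.Module):
    def __init__(self, num_beam_pairs=256):
        super().__init__()
        # LiDAR occupancy-grid encoder (assumed 1x32x32 input)
        self.lidar = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 16, 64))
        # Camera image encoder (assumed 3x64x64 input)
        self.image = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 16, 64))
        # GPS / position encoder (assumed 2-D coordinates)
        self.gps = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 64))
        # Fusion head: concatenated embeddings -> scores over beam pairs
        self.head = nn.Sequential(nn.Linear(64 * 3, 128), nn.ReLU(),
                                  nn.Linear(128, num_beam_pairs))

    def forward(self, lidar, image, gps):
        z = torch.cat([self.lidar(lidar), self.image(image), self.gps(gps)], dim=1)
        return self.head(z)          # logits over candidate beam pairs

model = FusionBeamSelector()
scores = model(torch.randn(1, 1, 32, 32), torch.randn(1, 3, 64, 64), torch.randn(1, 2))
topk = scores.topk(10, dim=1).indices   # restrict the sweep to the predicted top-10
```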
In IEEE 802.11 WiFi-based waveforms, the receiver performs coarse time and frequency synchronization using the first field of the preamble, known as the legacy short training field (L-STF). The L-STF occupies 40% of the preamble length and takes up 32 μs of airtime. With the goal of reducing communication overhead, we propose a modified waveform in which the preamble length is reduced by eliminating the L-STF. To decode this modified waveform, we propose a machine learning (ML) scheme called PRONTO that performs coarse time and frequency estimation using other preamble fields, specifically the legacy long training field (L-LTF). Our contributions are threefold: (i) We present PRONTO, featuring customized convolutional neural networks (CNNs) for packet detection and coarse CFO estimation, along with data augmentation steps for robust training. (ii) We propose a generalized decision flow that makes PRONTO compatible with legacy waveforms that include the standard L-STF. (iii) We validate the results on an over-the-air WiFi dataset collected from a testbed of software-defined radios (SDRs). Our evaluations show that PRONTO performs packet detection with 100% accuracy and coarse CFO estimation with errors smaller than 3%. We demonstrate that PRONTO provides up to 40% preamble length reduction with no bit error rate (BER) degradation. Finally, we experimentally show the speedup achieved by GPU parallelization over the corresponding CPU implementation.
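As a rough illustration of the approach, the sketch below shows a small 1-D CNN that takes raw I/Q samples of the L-LTF and produces a packet-detection logit plus a coarse CFO estimate, together with a CFO-rotation augmentation in the spirit of the data augmentation mentioned above. The filter counts, the 160-sample window, and the helper names are assumptions and do not reproduce PRONTO's exact design.

```python
# Illustrative 1-D CNN sketch: raw I/Q samples of the L-LTF go in (2 channels),
# a packet-detection logit and a coarse CFO estimate come out.
import torch
import torch.nn as nn

class LLTFNet(nn.Module):
    def __init__(self, num_samples=160):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten())
        self.detect = nn.Linear(32 * 8, 1)   # packet present / absent (logit)
        self.cfo = nn.Linear(32 * 8, 1)      # normalized coarse CFO (regression)

    def forward(self, iq):                   # iq: (batch, 2, num_samples)
        z = self.features(iq)
        return self.detect(z), self.cfo(z)

# Augmentation idea: train on copies of the same L-LTF rotated by random
# synthetic CFOs so the regressor generalizes.
def add_synthetic_cfo(iq_complex, cfo_norm):
    """Rotate a complex baseband waveform by a synthetic CFO (cycles/sample)."""
    n = torch.arange(iq_complex.shape[-1], dtype=torch.float32)
    phase = 2 * torch.pi * cfo_norm * n
    return iq_complex * torch.polar(torch.ones_like(phase), phase)
```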
Colosseum is an open and publicly available large-scale wireless testbed for experimental research with virtualized and softwarized waveforms and protocol stacks on a fully programmable, "white-box" platform. With 256 state-of-the-art software-defined radios and a massive channel emulator core, Colosseum can model virtually any scenario, enabling the design, development, and testing of solutions at scale under a variety of deployment and channel conditions. These radio-frequency scenarios are reproduced through high-fidelity FPGA-based emulation with finite impulse response filters. The filters model the taps of the desired wireless channels and apply them to the signals generated by the radio nodes, faithfully mimicking the conditions of real-world wireless environments. In this paper, we introduce Colosseum as a testbed that is, for the first time, open to the research community. We describe the architecture of Colosseum and its experimentation and emulation capabilities. We then demonstrate the effectiveness of Colosseum for experimental research through exemplary use cases involving prevailing wireless technologies (e.g., cellular and Wi-Fi) in spectrum sharing and unmanned aerial vehicle scenarios. A roadmap for future Colosseum updates concludes the paper.
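The channel-emulation principle described above can be illustrated in a few lines: represent the wireless channel by FIR taps, convolve every transmitted baseband signal with them, and add noise. The tap values, SNR, and test signal below are illustrative assumptions; the real emulation runs on FPGAs at RF sample rates.

```python
# Software analogue of FIR-based channel emulation: the wireless channel is a
# set of taps, and every transmitted signal is convolved with those taps (plus
# noise) before delivery to the receiving node. Values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Assumed 3-tap multipath channel: a direct path and two delayed echoes.
channel_taps = np.array([1.0 + 0.0j, 0.4 - 0.2j, 0.1 + 0.05j])

def emulate_channel(tx_signal, taps, snr_db=20.0):
    """Convolve the transmitted complex baseband signal with the channel taps
    and add white Gaussian noise at the requested SNR."""
    rx = np.convolve(tx_signal, taps, mode="full")[: len(tx_signal)]
    sig_power = np.mean(np.abs(rx) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(len(rx))
                                        + 1j * rng.standard_normal(len(rx)))
    return rx + noise

tx = np.exp(1j * 2 * np.pi * 0.05 * np.arange(1024))   # a test complex tone
rx = emulate_channel(tx, channel_taps)
```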
The usage of technologically advanced devices has seen a boom in many domains, including education, automation, and healthcare, with most of the services requiring Internet connectivity. Device identification plays a key role in securing a network. In this paper, a device fingerprinting (DFP) model, which is able to distinguish between Internet of Things (IoT) and non-IoT devices, as well as uniquely identify individual devices, has been proposed. Four statistical features have been extracted from five consecutive device-originated packets to generate individual device fingerprints. The method has been evaluated using the Random Forest (RF) classifier and different datasets. Experimental results have shown that the proposed method achieves up to 99.8% accuracy in distinguishing between IoT and non-IoT devices and over 97.6% in classifying individual devices. These results signify that the proposed method is useful in assisting operators in making their networks more secure and robust to security breaches and unauthorized access.
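A hedged sketch of such a pipeline is shown below: statistical features computed over a window of five consecutive device-originated packets feed a Random Forest classifier. The particular four features (mean and standard deviation of packet size and inter-arrival time) and the toy data are assumptions, not necessarily the features used in the paper.

```python
# Device-fingerprinting sketch: five consecutive packets -> four statistical
# features -> Random Forest. Feature choices and toy data are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fingerprint(packet_sizes, timestamps):
    """Build one feature vector from five consecutive device-originated packets."""
    sizes = np.asarray(packet_sizes[:5], dtype=float)
    iats = np.diff(np.asarray(timestamps[:5], dtype=float))
    return np.array([sizes.mean(), sizes.std(), iats.mean(), iats.std()])

# X: one fingerprint per observation window; y: device (or IoT / non-IoT) label.
X = np.stack([fingerprint([60, 60, 1514, 60, 1514], [0.0, 0.1, 0.15, 0.3, 0.31]),
              fingerprint([200, 220, 210, 205, 215], [0.0, 1.0, 2.0, 3.0, 4.0])])
y = np.array(["camera", "thermostat"])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X))
```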
Objective: Despite numerous studies proposed for audio restoration in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with a limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and durations. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of the restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational GANs are used with a generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts, each with a random severity, to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneer study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide SDR range and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository.
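Since the reported gains are measured in SDR, the snippet below gives a common reference definition of the metric (clean signal versus restored signal); the paper's exact evaluation protocol and the Op-GAN model itself are not reproduced here.

```python
# Signal-to-distortion ratio (SDR) in dB, as commonly defined for restoration
# benchmarks: power of the clean reference over power of the residual error.
import numpy as np

def sdr_db(reference, estimate):
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(distortion ** 2) + 1e-12))

clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)        # 1 s, 440 Hz tone
corrupted = clean + 0.1 * np.random.default_rng(0).standard_normal(16000)
print(f"SDR of the corrupted signal: {sdr_db(clean, corrupted):.1f} dB")
```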
Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents a scheduler for privacy budgets that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focuses on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
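To illustrate the problem shape only (not the paper's DPK algorithm), the sketch below admits training tasks greedily against per-block DP budgets: a task runs only if every data block it touches still has enough budget, and spent budget never returns.

```python
# Privacy-knapsack flavor in miniature: tasks demand epsilon from data blocks
# with finite budget; the scheduler tries to admit as many tasks as possible.
# This greedy heuristic (smallest total demand first) is NOT the DPK algorithm.
def greedy_privacy_schedule(tasks, block_budgets):
    """tasks: list of (task_id, {block_id: epsilon_demand}) pairs.
    block_budgets: {block_id: remaining_epsilon}. Returns admitted task ids."""
    remaining = dict(block_budgets)
    admitted = []
    for task_id, demands in sorted(tasks, key=lambda t: sum(t[1].values())):
        if all(remaining.get(b, 0.0) >= eps for b, eps in demands.items()):
            for b, eps in demands.items():
                remaining[b] -= eps          # DP budget, once spent, is gone
            admitted.append(task_id)
    return admitted

tasks = [("model_a", {"block1": 0.5}), ("model_b", {"block1": 0.8, "block2": 0.2}),
         ("model_c", {"block2": 0.9})]
print(greedy_privacy_schedule(tasks, {"block1": 1.0, "block2": 1.0}))
```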
Automatic medical image classification is a very important field where the use of AI has the potential to have a real social impact. However, many challenges still stand in the way of practically effective solutions. One of them is that most medical imaging datasets suffer from class imbalance, which causes existing AI techniques, particularly neural network-based deep-learning methodologies, to perform poorly in such scenarios. This makes the area an interesting and active research focus. In this study, we propose a novel loss function to train neural network models to mitigate this critical issue in this important field. Through rigorous experiments on three independently collected datasets from three different medical imaging domains, we empirically show that our proposed loss function consistently performs well, with an improvement of 2%-10% in macro F1 compared to the baseline models. We hope that our work will precipitate new research toward a more generalized approach to medical image classification.
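The abstract does not spell out the proposed loss, so the sketch below stands in with a standard alternative for class imbalance, cross-entropy weighted by inverse class frequency; it only illustrates the kind of adjustment involved and is not the paper's loss function.

```python
# Stand-in for an imbalance-aware training objective: cross-entropy weighted by
# inverse class frequency. This is NOT the loss proposed in the paper.
import torch
import torch.nn as nn

def imbalance_weighted_ce(class_counts):
    counts = torch.tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
    return nn.CrossEntropyLoss(weight=weights)

# Example: a 3-class medical dataset where the third class is rare (assumed counts).
criterion = imbalance_weighted_ce([5000, 3000, 200])
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)
```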
Understanding why a model makes certain predictions is crucial when adapting it for real world decision making. LIME is a popular model-agnostic feature attribution method for the tasks of classification and regression. However, the task of learning to rank in information retrieval is more complex in comparison with either classification or regression. In this work, we extend LIME to propose Rank-LIME, a model-agnostic, local, post-hoc linear feature attribution method for the task of learning to rank that generates explanations for ranked lists. We employ novel correlation-based perturbations, differentiable ranking loss functions and introduce new metrics to evaluate ranking based additive feature attribution models. We compare Rank-LIME with a variety of competing systems, with models trained on the MS MARCO datasets and observe that Rank-LIME outperforms existing explanation algorithms in terms of Model Fidelity and Explain-NDCG. With this we propose one of the first algorithms to generate additive feature attributions for explaining ranked lists.
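A stripped-down local surrogate in the LIME spirit is sketched below: perturb the feature vector of one query-document pair, query a black-box scorer, and fit a proximity-weighted linear model whose coefficients act as attributions. Rank-LIME's correlation-based perturbations and differentiable ranking losses are not reproduced; the function and the toy scorer are illustrative.

```python
# Simplified LIME-style surrogate for one document's ranking score. It does not
# implement Rank-LIME's correlation-based perturbations or ranking losses.
import numpy as np
from sklearn.linear_model import Ridge

def explain_score(black_box_score, x, n_samples=500, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    perturbations = x + scale * rng.standard_normal((n_samples, x.shape[0]))
    scores = np.array([black_box_score(p) for p in perturbations])
    # Weight samples by proximity to the original feature vector.
    weights = np.exp(-np.linalg.norm(perturbations - x, axis=1) ** 2 / scale)
    surrogate = Ridge(alpha=1.0).fit(perturbations, scores, sample_weight=weights)
    return surrogate.coef_          # per-feature attribution for this document

# Toy black-box scorer: feature 0 helps relevance, feature 2 hurts it.
score_fn = lambda f: 2.0 * f[0] - 1.5 * f[2] + 0.1 * f[1]
print(explain_score(score_fn, np.array([0.5, 0.2, 0.8])))
```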
Handwriting recognition has been a field of great interest in the Artificial Intelligence domain. Due to its broad real-life use cases, research has been conducted widely on it. Prominent work has been done in this field, focusing mainly on Latin characters. However, the domain of Arabic handwritten character recognition is still relatively unexplored. The inherent cursive nature of Arabic characters and variations in writing styles across individuals make the task even more challenging. We identified some probable reasons behind this and proposed a lightweight Convolutional Neural Network-based architecture for recognizing Arabic characters and digits. The proposed pipeline consists of a total of 18 layers: four layers each for convolution, pooling, batch normalization, and dropout, followed by one global average pooling layer and a dense layer. Furthermore, we thoroughly investigated different hyperparameter choices, such as the optimizer, kernel initializer, and activation function. Evaluated on the publicly available 'Arabic Handwritten Character Dataset (AHCD)' and 'Modified Arabic handwritten digits Database (MadBase)' datasets, the proposed model achieved accuracies of 96.93% and 99.35%, respectively, which are comparable to the state-of-the-art and make it a suitable solution for real-life end-level applications.
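The layer budget described above maps naturally onto a small model; the sketch below follows it (four blocks of convolution, pooling, batch normalization, and dropout, then global average pooling and a dense layer), but the channel counts, kernel sizes, dropout rate, and the assumed 32x32 grayscale input are illustrative rather than the paper's exact hyperparameters.

```python
# 18-layer budget from the abstract: 4 x (conv, pool, batch-norm, dropout)
# + global average pooling + dense. Concrete sizes here are assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, p=0.1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.BatchNorm2d(out_ch),
        nn.Dropout(p))

model = nn.Sequential(
    conv_block(1, 32), conv_block(32, 64), conv_block(64, 128), conv_block(128, 256),
    nn.AdaptiveAvgPool2d(1),          # global average pooling
    nn.Flatten(),
    nn.Linear(256, 28))               # 28 Arabic characters (10 outputs for digits)

logits = model(torch.randn(4, 1, 32, 32))   # batch of four 32x32 grayscale glyphs
```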
Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting it; the system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
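The core idea, predicting an upper quantile of job duration so that errors skew toward overprediction, can be sketched with off-the-shelf quantile regression; the 0.9 quantile, the toy features, and the synthetic durations below are assumptions for illustration, not Acela's actual model.

```python
# Quantile regression biased toward overprediction: fit the 90th percentile of
# job duration with a pinball (quantile) loss. Features and data are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))                    # e.g. job-type / hardware features
y = 30 + 60 * X[:, 0] + 10 * rng.standard_normal(500)   # job duration in minutes

# alpha=0.9 targets the 90th percentile, so most jobs finish within the prediction.
model = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)
pred = model.predict(X)
print(f"fraction of jobs overpredicted: {(pred >= y).mean():.2f}")
```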